20 research outputs found

    Weighted Distance-Based Models for Ranking Data Using the R Package rankdist

    rankdist is a recently developed R package that implements various distance-based ranking models. These models capture the probability of occurrence of rankings based on the distances between them. The package provides a framework for fitting and evaluating finite mixtures of distance-based models. This paper also presents a new probability model for ranking data based on a new notion of weighted Kendall distance. The new model is flexible and more interpretable than the existing models. We show that the new model has an analytic form of the probability mass function and that the maximum likelihood estimates of the model parameters can be obtained efficiently, even for rankings involving a large number of objects.
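
    As a rough illustration of what a distance-based ranking model looks like, the sketch below uses the standard (unweighted) Kendall distance and a Mallows-type probability mass function in Python. It is not the weighted Kendall distance proposed in the paper and it does not use the rankdist API; the function names `kendall_distance` and `mallows_pmf` and the parameter `lam` are illustrative assumptions.

```python
from itertools import combinations, permutations
from math import exp


def kendall_distance(r1, r2):
    """Count the object pairs that the two rankings order differently.

    r[i] is the rank given to object i (standard, unweighted Kendall distance).
    """
    return sum(
        1
        for i, j in combinations(range(len(r1)), 2)
        if (r1[i] - r1[j]) * (r2[i] - r2[j]) < 0
    )


def mallows_pmf(ranking, center, lam):
    """P(ranking) proportional to exp(-lam * d(ranking, center)).

    The normalising constant is computed by brute force over all
    permutations, so this is only feasible for a handful of objects.
    """
    n = len(center)
    z = sum(
        exp(-lam * kendall_distance(p, center))
        for p in permutations(range(1, n + 1))
    )
    return exp(-lam * kendall_distance(ranking, center)) / z


center = (1, 2, 3, 4)  # hypothetical modal ranking of four objects
print(kendall_distance((2, 1, 3, 4), center))
print(mallows_pmf(center, center, lam=0.5))
```

    Rankings closer to the central ranking receive higher probability, and a larger `lam` concentrates the mass more tightly around it. The brute-force normalisation is used here only to keep the sketch self-contained.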

    TRIAGE: Characterizing and auditing training data for improved regression

    Data quality is crucial for robust machine learning algorithms, with the recent interest in data-centric AI emphasizing the importance of training data characterization. However, current data characterization methods focus largely on classification settings, leaving regression settings understudied. To address this, we introduce TRIAGE, a novel data characterization framework tailored to regression tasks and compatible with a broad class of regressors. TRIAGE utilizes conformal predictive distributions to provide a model-agnostic scoring method, the TRIAGE score. We operationalize the score to analyze individual samples' training dynamics and characterize samples as under-, over-, or well-estimated by the model. We show that TRIAGE's characterization is consistent and highlight its utility to improve performance via data sculpting/filtering, in multiple regression settings. Additionally, beyond the sample level, we show that TRIAGE enables new approaches to dataset selection and feature acquisition. Overall, TRIAGE highlights the value unlocked by data characterization in real-world regression applications. Comment: Presented at NeurIPS 202
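
    To give a feel for how a conformal predictive distribution can turn a regressor's point prediction into a per-sample score, here is a simplified split-conformal sketch in Python. It is not the TRIAGE score as defined by the authors: the function names, the fixed 0.5 tie-breaking term, and the 0.25/0.75 thresholds are all assumptions chosen for illustration.

```python
def cpd_value(y, y_pred, calib_residuals):
    """Simplified split-conformal predictive distribution evaluated at y.

    calib_residuals holds (true - predicted) values from a held-out
    calibration set. The result estimates P(Y <= y) for this sample.
    """
    n = len(calib_residuals)
    below = sum(1 for r in calib_residuals if y_pred + r <= y)
    # Assumption: a fixed 0.5 stands in for the usual random tie-breaker.
    return (below + 0.5) / (n + 1)


def characterize(p, low=0.25, high=0.75):
    """Map a score in (0, 1) to a label; the thresholds are hypothetical."""
    if p > high:
        return "under-estimated"  # true label sits high in the predictive distribution
    if p < low:
        return "over-estimated"  # true label sits low in the predictive distribution
    return "well-estimated"


residuals = [-1.0, -0.5, 0.0, 0.5, 1.0]  # hypothetical calibration residuals
for y_true in (8.0, 10.0, 12.0):
    score = cpd_value(y_true, y_pred=10.0, calib_residuals=residuals)
    print(y_true, round(score, 3), characterize(score))
```

    The abstract describes tracking such a score over training dynamics; this sketch evaluates it only once, for a single fixed model.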

    Neural Laplace Control for Continuous-time Delayed Systems

    Many real-world offline reinforcement learning (RL) problems involve continuous-time environments with delays. Such environments are characterized by two distinctive features: firstly, the state x(t) is observed at irregular time intervals, and secondly, the current action a(t) only affects the future state x(t + g) with an unknown delay g > 0. A prime example of such an environment is satellite control, where the communication link between Earth and a satellite causes irregular observations and delays. Existing offline RL algorithms have achieved success in environments with irregularly observed states in time or known delays. However, environments involving both irregular observations in time and unknown delays remain an open and challenging problem. To this end, we propose Neural Laplace Control, a continuous-time model-based offline RL method that combines a Neural Laplace dynamics model with a model predictive control (MPC) planner, and is able to learn from an offline dataset sampled with irregular time intervals from an environment that has an inherent unknown constant delay. We show experimentally that, on continuous-time delayed environments, it is able to achieve near-expert policy performance. Comment: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics (AISTATS) 2023, Valencia, Spain. PMLR: Volume 206. Copyright 2023 by the author(s)

    Retrospective cohort study of admission timing and mortality following COVID-19 infection in England.

    OBJECTIVES: We investigated whether the timing of hospital admission is associated with the risk of mortality for patients with COVID-19 in England, and the factors associated with a longer interval between symptom onset and hospital admission. DESIGN: Retrospective observational cohort study of data collected by the COVID-19 Hospitalisation in England Surveillance System (CHESS). Data were analysed using multivariate regression analysis. SETTING: Acute hospital trusts in England that submit data to CHESS routinely. PARTICIPANTS: Of 14 150 patients included in CHESS until 13 May 2020, 401 lacked a confirmed diagnosis of COVID-19 and 7666 lacked a recorded date of symptom onset. This left 6083 individuals, of whom 15 were excluded because the time between symptom onset and hospital admission exceeded 3 months. The study cohort therefore comprised 6068 unique individuals. MAIN OUTCOME MEASURES: All-cause mortality during the study period. RESULTS: Timing of hospital admission was an independent predictor of mortality following adjustment for age, sex, comorbidities, ethnicity and obesity. Each additional day between symptom onset and hospital admission was associated with a 1% increase in mortality risk (HR 1.01; p<0.005). Healthcare workers were most likely to have an increased interval between symptom onset and hospital admission, as were people from Black, Asian and minority ethnic (BAME) backgrounds, and patients with obesity. CONCLUSION: The timing of hospital admission is associated with mortality in patients with COVID-19. Healthcare workers and individuals from a BAME background are at greater risk of later admission, which may contribute to reports of poorer outcomes in these groups. Strategies to identify and admit high-risk patients and those showing signs of deterioration in a timely way may reduce the consequent mortality from COVID-19, and should be explored.
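
    To make the per-day hazard ratio more concrete, the short Python sketch below compounds HR 1.01 over several days of delay. It assumes the delay enters the hazard model log-linearly, so that k extra days multiply the hazard by 1.01 raised to the power k; the abstract reports only the per-day figure, so this extrapolation and the function name are illustrative assumptions rather than results from the study.

```python
def compounded_hazard_ratio(hr_per_day, days):
    """Hazard ratio implied by `days` extra days of delay.

    Assumes a log-linear effect of delay, so per-day ratios multiply.
    """
    return hr_per_day ** days


# Hypothetical delays of one day, one week and two weeks.
for days in (1, 7, 14):
    print(days, round(compounded_hazard_ratio(1.01, days), 3))
```

    Under that assumption the effect of longer delays grows multiplicatively rather than additively, although for a ratio this close to 1 the two are nearly indistinguishable over a few days.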

    Clairvoyance: A Pipeline Toolkit for Medical Time Series

    Time-series learning is the bread and butter of data-driven *clinical decision support*, and the recent explosion in ML research has demonstrated great potential in various healthcare settings. At the same time, medical time-series problems in the wild are challenging due to their highly *composite* nature: They entail design choices and interactions among components that preprocess data, impute missing values, select features, issue predictions, estimate uncertainty, and interpret models. Despite exponential growth in electronic patient data, there is a remarkable gap between the potential and realized utilization of ML for clinical research and decision support. In particular, orchestrating a real-world project lifecycle poses challenges in engineering (i.e. hard to build), evaluation (i.e. hard to assess), and efficiency (i.e. hard to optimize). Designed to address these issues simultaneously, Clairvoyance proposes a unified, end-to-end, autoML-friendly pipeline that serves as a (i) software toolkit, (ii) empirical standard, and (iii) interface for optimization. Our ultimate goal lies in facilitating transparent and reproducible experimentation with complex inference workflows, providing integrated pathways for (1) personalized prediction, (2) treatment-effect estimation, and (3) information acquisition. Through illustrative examples on real-world data in outpatient, general ward, and intensive-care settings, we demonstrate the applicability of the pipeline paradigm on core tasks in the healthcare journey. To the best of our knowledge, Clairvoyance is the first to demonstrate the viability of a comprehensive and automatable pipeline for clinical time-series ML.